fix(alerts): Don't group on time #89447


Merged

merged 24 commits into master from ceorourke/dont-group-on-time-tsdb on Apr 21, 2025

Conversation


@ceorourke ceorourke commented Apr 11, 2025

Issue alerts with "slow" conditions over a large time period (1 day, 1 week, 1 month) can hit our 10,000-result limit when there are a lot of groups to process, because we return timestamped data per group (e.g. group id 1 has data for 12:01, 12:02, 12:03, and so on, multiplied by the number of groups we're querying). Data past the limit is dropped, which produces lower counts than expected. This can cause alerts to not fire, since the data we're receiving isn't accurate.

To fix this we disable `group_on_time`: instead of receiving all the timestamped data and summing it later, we simply receive the sum without the timestamps.

I renamed `get_sums` to `get_timeseries_sums` and pointed the usages that rely on the timeseries data at that. The places that use `get_sums` now expect to receive the aggregate data as a simple dictionary, `{group_id: count}`.
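To illustrate the shape change described above, here is a hypothetical sketch (the data and exact return shapes are made up for illustration; the real tsdb functions in Sentry's codebase differ in detail):

```python
# Timeseries shape (old get_sums, now get_timeseries_sums): one
# (timestamp, count) pair per rollup bucket per group. Each bucket is a
# separate result row, so many groups x many buckets can hit the row limit.
timeseries = {
    1: [(1_700_000_000, 3), (1_700_000_060, 2), (1_700_000_120, 4)],
    2: [(1_700_000_000, 1), (1_700_000_060, 5)],
}

# Summing client-side yields the per-group totals the alert rules need.
totals_from_timeseries = {
    group_id: sum(count for _, count in series)
    for group_id, series in timeseries.items()
}

# Aggregate shape (get_sums with group_on_time disabled): one row per
# group, so the result size scales with group count, not groups x buckets.
totals = {1: 9, 2: 6}

assert totals_from_timeseries == totals
```

The totals are identical either way; the aggregate form just can't lose rows to the limit in the same way.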

referrer_suffix=referrer_suffix,
)
result: Mapping[int, int] = {}
if tsdb_function == tsdb.get_sums:
Member Author:

I didn't want to add group_on_time to the other tsdb functions we may pass since they don't use it anyway, but this feels a little hacky.

Member:

One option could be to use functools.partial(tsdb.get_sums, group_on_time=True) when you pass it, instead of passing it through here.

I think either that, or adding group_on_time as a param to all funcs is the right move
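The `functools.partial` suggestion could look roughly like this sketch (function names and signatures here are illustrative stand-ins, not Sentry's actual APIs):

```python
from functools import partial

def get_sums(model, keys, start, end, group_on_time=False):
    # Stand-in for tsdb.get_sums; returns {key: count} when
    # group_on_time is False, per the change described above.
    return {key: 0 for key in keys}

def run_query(tsdb_function, model, keys, start, end):
    # The generic query path no longer inspects which function it was
    # handed, so the `if tsdb_function == tsdb.get_sums:` check goes away.
    return tsdb_function(model, keys, start, end)

# The caller binds the extra keyword argument up front:
bound = partial(get_sums, group_on_time=True)
result = run_query(bound, "events", [1, 2], 0, 60)
```

This keeps the special-casing at the call site that knows it wants timeseries behavior, rather than inside the shared query helper.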

Member

@wedamija wedamija left a comment


You'll need to do this for get_distinct_counts_totals too

Comment on lines 216 to 217
group_1_id: 5,
group_2_id: 5,
Member:

Why are the counts 5 here, when we created only 4 events per group?

Member Author:

Oh this is because the base setup creates events with the same fingerprint, I'll update the fingerprint to be different
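A toy illustration of the fingerprint issue discussed above (the fingerprints and counts are made up, not the actual test data): when every event shares one fingerprint, they all collapse into a single group, so per-group counts come out higher than the events-per-group the test intended.

```python
from collections import Counter

def count_per_group(events):
    # Events with the same fingerprint land in the same group.
    return Counter(fingerprint for fingerprint, _ in events)

same = [("group-a", i) for i in range(8)]          # one shared fingerprint
distinct = [("group-a", i) for i in range(4)] + [  # two distinct fingerprints
    ("group-b", i) for i in range(4)
]

assert count_per_group(same) == {"group-a": 8}
assert count_per_group(distinct) == {"group-a": 4, "group-b": 4}
```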


codecov bot commented Apr 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #89447   +/-   ##
=======================================
  Coverage   87.71%   87.71%           
=======================================
  Files       10180    10180           
  Lines      574508   574541   +33     
  Branches    22632    22632           
=======================================
+ Hits       503932   503969   +37     
+ Misses      70148    70144    -4     
  Partials      428      428           

@ceorourke ceorourke marked this pull request as ready for review April 15, 2025 22:59
@ceorourke ceorourke requested review from a team as code owners April 15, 2025 22:59
@ceorourke ceorourke requested a review from wedamija April 15, 2025 23:13

@wedamija wedamija left a comment


lgtm, let's merge it when we can be around to watch the deploy though

@ceorourke ceorourke merged commit 37cef4e into master Apr 21, 2025
64 checks passed
@ceorourke ceorourke deleted the ceorourke/dont-group-on-time-tsdb branch April 21, 2025 17:26
andrewshie-sentry pushed a commit that referenced this pull request Apr 22, 2025

sentry-io bot commented Apr 23, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ SnubaError: HTTPConnectionPool(host='snuba-api', port=80): Read timed out. (read timeout=30) sentry.tasks.digests.deliver_digest View Issue
  • ‼️ RateLimitExceeded: Query on could not be run due to allocation policies, info: {'details': {'ConcurrentRateLimitAllocationPolicy': {'can_run': False, 'max_threads': 0, 'explanation': {'reason': 'concurrent policy 17 exceeds limit of 16', 'overrides': {}, 'storage_key': 'S... sentry.tasks.digests.deliver_digest View Issue


@github-actions github-actions bot locked and limited conversation to collaborators May 9, 2025
Labels
Scope: Backend Automatically applied to PRs that change backend components